Test FLINT before building msolve#329

Merged
vneiger merged 1 commit into algebraic-solving:master from wegank:flint-test
Apr 9, 2026

@wegank
Contributor

@wegank wegank commented Mar 28, 2026

No description provided.

@wegank wegank changed the title Test FLINT before building msolve [DONTMERGE] Test FLINT before building msolve Mar 28, 2026
@wegank wegank force-pushed the flint-test branch 2 times, most recently from a8b9edf to 776c081 Compare March 28, 2026 22:12
@wegank wegank changed the title [DONTMERGE] Test FLINT before building msolve Test FLINT before building msolve Mar 28, 2026
@wegank wegank marked this pull request as ready for review March 28, 2026 22:31
@jerebertho
Contributor

jerebertho commented Apr 8, 2026

Hi,

Many thanks! When rerunning the jobs, one of them crashed during Test FLINT because of an illegal instruction. It would be interesting to modify the CI so that the remaining steps, in particular Build and Test msolve, still run after such a failure.

We would then be able to see if the failures of Build and Test msolve, due to an illegal instruction that we observed lately, happen exactly when Test FLINT fails for the same reason.

@wegank wegank force-pushed the flint-test branch 5 times, most recently from a036cb2 to 16dadb6 Compare April 8, 2026 23:01
@wegank
Contributor Author

wegank commented Apr 8, 2026

I've added continue-on-error: true for the FLINT tests.

Now, in https://github.com/algebraic-solving/msolve/actions/runs/24163125530, both the FLINT and msolve tests fail with Illegal instruction.

@jerebertho
Contributor

Thanks, this looks good to me!

Now we need to know why we get these illegal instructions though...

@wegank
Contributor Author

wegank commented Apr 9, 2026

I may have a clue: FLINT identifies the processor as Zen 4 in the failing run, while it reports Zen 3 in all others. I strongly doubt the processor is actually consistently Zen 3.

@wegank
Contributor Author

wegank commented Apr 9, 2026

Well, AMD EPYC 9V74 80-Core Processor does indeed sound like Zen 4. Probably a general Zen 4 bug on the FLINT side?

@vneiger
Contributor

vneiger commented Apr 9, 2026

> Well, AMD EPYC 9V74 80-Core Processor does indeed sound like Zen 4. Probably a general Zen 4 bug on the FLINT side?

The laptop I use most of the time is Zen 4 and I see no general issue with FLINT. Wrong architecture detection can happen, but you say that this one really is Zen 4? Then something is fishy: in the failing run, msolve did not detect the AVX-512 flags, whereas it should have if this is indeed Zen 4.

@vneiger
Contributor

vneiger commented Apr 9, 2026

I'm referring to lines such as:

checking whether AVX512-F is supported by the processor... no

This should be "yes" for Zen 4, as far as I know.

@wegank
Contributor Author

wegank commented Apr 9, 2026

Here's the output of lscpu, where I don't see AVX512:

Architecture:                            x86_64
CPU op-mode(s):                          32-bit, 64-bit
Address sizes:                           48 bits physical, 48 bits virtual
Byte Order:                              Little Endian
CPU(s):                                  4
On-line CPU(s) list:                     0-3
Vendor ID:                               AuthenticAMD
Model name:                              AMD EPYC 9V74 80-Core Processor
CPU family:                              25
Model:                                   17
Thread(s) per core:                      2
Core(s) per socket:                      2
Socket(s):                               1
Stepping:                                1
BogoMIPS:                                5192.26
Flags:                                   fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl tsc_reliable nonstop_tsc cpuid extd_apicid aperfmperf tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm cmp_legacy svm cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw topoext vmmcall fsgsbase bmi1 avx2 smep bmi2 erms invpcid rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves user_shstk clzero xsaveerptr rdpru arat npt nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold v_vmsave_vmload umip vaes vpclmulqdq rdpid fsrm
Virtualization:                          AMD-V
Hypervisor vendor:                       Microsoft
Virtualization type:                     full
L1d cache:                               64 KiB (2 instances)
L1i cache:                               64 KiB (2 instances)
L2 cache:                                2 MiB (2 instances)
L3 cache:                                32 MiB (1 instance)
NUMA node(s):                            1
NUMA node0 CPU(s):                       0-3
Vulnerability Gather data sampling:      Not affected
Vulnerability Ghostwrite:                Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Old microcode:             Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Vulnerable: Safe RET, no microcode
Vulnerability Spec store bypass:         Vulnerable
Vulnerability Spectre v1:                Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; Retpolines; STIBP disabled; RSB filling; PBRSB-eIBRS Not affected; BHI Not affected
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Vulnerable: No microcode
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Not affected
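
The check done by eye on the Flags line above can be sketched in POSIX shell. This is only an illustration: the helper name `has_cpu_flag` is made up here, and the sample flag list is a hand-picked subset of the lscpu output above, not the full line.

```shell
# Return success (exit status 0) if the flag named in $1 appears as a whole
# word in the space-separated flag list given in $2.
# has_cpu_flag is a hypothetical helper, not part of msolve or FLINT.
has_cpu_flag() {
    case " $2 " in
        *" $1 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Hand-picked subset of the Flags line from the lscpu output above.
flags="fma avx f16c avx2 bmi2 vaes vpclmulqdq"

if has_cpu_flag avx512f "$flags"; then
    echo "avx512f: yes"
else
    echo "avx512f: no"
fi
```

Matching whole words matters here: a plain substring search for `avx` would also match `avx2`. On a Linux runner the flag list could presumably be taken from the `flags` line of `/proc/cpuinfo` instead of a hard-coded string.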

@vneiger
Contributor

vneiger commented Apr 9, 2026

Odd situation. Official information is difficult to find: this model is not in the tables on AMD's website. Maybe it is some special product; the core count is unusual as well. I guess we should open an issue in FLINT so that this specific CPUID is mapped to Zen 3 (thus disabling the AVX-512 pieces of code)?

@wegank wegank marked this pull request as draft April 9, 2026 13:55
@wegank wegank force-pushed the flint-test branch 3 times, most recently from 166ea5e to 1b25321 Compare April 9, 2026 16:01
@wegank
Contributor Author

wegank commented Apr 9, 2026

I made a dirty patch for FLINT that downgrades modelstr to x86_64 if CPUID.0x7.0:EBX.AVX512F does not report 1, and now msolve tests pass on AMD EPYC 9V74. The CI failure is unrelated, though.
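
For readers unfamiliar with the notation, CPUID.0x7.0:EBX.AVX512F refers to bit 16 of the EBX register returned by CPUID leaf 0x7, subleaf 0. The sketch below only illustrates that bit test in POSIX shell; it is not the patch itself, the function name `ebx_has_avx512f` is made up, and the EBX values are hypothetical inputs rather than values read from the runner.

```shell
# AVX512F is bit 16 of EBX in CPUID leaf 0x7, subleaf 0.
# $1 is the EBX register value (decimal or 0x-prefixed hexadecimal).
# ebx_has_avx512f is a hypothetical name used for illustration only.
ebx_has_avx512f() {
    [ $(( ($1 >> 16) & 1 )) -eq 1 ]
}

# Hypothetical EBX values: the first has only bit 16 set,
# the second has only bit 5 (AVX2) set.
for ebx in 0x00010000 0x00000020; do
    if ebx_has_avx512f "$ebx"; then
        echo "$ebx: AVX512F reported"
    else
        echo "$ebx: AVX512F not reported"
    fi
done
```

Obtaining the real EBX value requires executing the CPUID instruction, which plain shell cannot do; a build system would typically perform this test from C or from a configure-time probe.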

@vneiger
Contributor

vneiger commented Apr 9, 2026

> I made a dirty patch for FLINT that downgrades modelstr to x86_64 if CPUID.0x7.0:EBX.AVX512F does not report 1, and now msolve tests pass on AMD EPYC 9V74.

If feasible, to fall back to a less generic target than x86_64, you could use
./configure --host=zen3-pc-linux-gnu --build=zen3-pc-linux-gnu
in the build of FLINT (under the condition that you mention, i.e. "CPUID... does not report 1"). That should be enough to circumvent the AVX-512 issue.

> The CI failure is unrelated, though.

Oops.
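
The conditional configure call suggested above could look roughly like the following in a CI script. This is a sketch under assumptions: `flint_host_args` is a made-up helper, the flag list is a hand-picked subset of the lscpu output earlier in the thread, and the fallback only makes sense on a machine that FLINT would otherwise treat as Zen 4, as in the failing run.

```shell
# Print extra ./configure arguments for FLINT given a space-separated CPU
# flag list in $1. Assumption: the CPU is one that FLINT would otherwise
# identify as Zen 4; on other CPUs the zen3 target may be inappropriate.
# flint_host_args is a hypothetical helper used for illustration only.
flint_host_args() {
    case " $1 " in
        *" avx512f "*) ;;  # AVX-512 available: keep FLINT's own detection
        *) echo "--host=zen3-pc-linux-gnu --build=zen3-pc-linux-gnu" ;;
    esac
}

# Hand-picked subset of the lscpu Flags line from the failing runner.
flags="fma avx avx2 bmi2"

# Show the command that would be run rather than running it.
echo "./configure $(flint_host_args "$flags")"
```

The sketch prints the command instead of executing it, so it can be tried outside a FLINT source tree.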

@wegank
Contributor Author

wegank commented Apr 9, 2026

Yeah, downgrading modelstr to zen3 probably suffices. I've opened an issue in the FLINT repo in the meantime.

@wegank wegank marked this pull request as ready for review April 9, 2026 21:04
@vneiger
Contributor

vneiger commented Apr 9, 2026

LGTM. Thanks for locating the config issue and helping to fix it!

@vneiger vneiger merged commit 156fe8a into algebraic-solving:master Apr 9, 2026
6 checks passed
@wegank wegank deleted the flint-test branch April 9, 2026 21:23